intermediate output
EasyToHard
A.1 Datasets
Details of the datasets we introduce are presented in this section. Specific details about generation, as well as statistics from the resulting datasets, are delineated for each one below.
A.1.1 Prefix sum data
Binary string inputs of length n are generated by selecting a random integer in [0, 2^n) and converting it to its n-bit binary representation. Datasets are produced by repeating this random process 10,000 times without replacement. Because the number of possible points increases exponentially as a function of n while the size of the generated dataset is fixed, the dataset becomes sparser in its ambient hypercube as n increases. Moreover, we are limited to binary strings of length n > 13 to avoid duplicate data points.
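As a concrete illustration of the sampling procedure described above, here is a minimal sketch that draws distinct length-n binary strings and pairs them with prefix-sum targets. The target construction (running sum mod 2) and all names are assumptions for illustration; the dataset's released generator is authoritative.

```python
import random

def make_prefix_sum_dataset(n, size=10_000, seed=0):
    # Sampling without replacement requires 2**n >= size; for a
    # 10,000-point dataset this forces n > 13.
    assert 2 ** n >= size
    rng = random.Random(seed)
    ints = set()
    while len(ints) < size:
        ints.add(rng.randrange(2 ** n))     # random integer in [0, 2**n)
    inputs, targets = [], []
    for x in ints:
        bits = [int(b) for b in format(x, f"0{n}b")]   # n-bit binary string
        running, prefix = 0, []
        for b in bits:                      # assumed target: prefix sum mod 2
            running = (running + b) % 2
            prefix.append(running)
        inputs.append(bits)
        targets.append(prefix)
    return inputs, targets
```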
Memory- and Latency-Constrained Inference of Large Language Models via Adaptive Split Computing
Sung, Mingyu, Palakonda, Vikas, Im, Suhwan, Moon, Sunghwan, Kim, Il-Min, Yun, Sangseok, Kang, Jae-Mo
Large language models (LLMs) have achieved near-human performance across diverse reasoning tasks, yet their deployment on resource-constrained Internet-of-Things (IoT) devices remains impractical due to massive parameter footprints and memory-intensive autoregressive decoding. While split computing offers a promising solution by partitioning model execution between edge devices and cloud servers, existing approaches fail to address the unique challenges of autoregressive inference, particularly the iterative token generation process and expanding key-value (KV) cache requirements. This work introduces the first autoregressive-aware split computing framework designed explicitly for LLM deployment on edge devices. Our approach makes three key contributions. First, we develop one-point split compression (OPSC), a mixed-precision quantization scheme that prevents out-of-memory failures by strategically partitioning models into front-end and back-end segments with different precision levels. Second, we propose a two-stage intermediate compression pipeline that combines threshold splitting (TS) and token-wise adaptive bit quantization (TAB-Q) to preserve accuracy-critical activations while dramatically reducing communication overhead. Third, we formulate a unified optimization framework that jointly selects optimal split points, quantization settings, and sequence lengths to satisfy strict memory and latency constraints. Extensive evaluations across diverse LLMs and hardware platforms demonstrate superior performance compared to state-of-the-art quantization methods, including SmoothQuant, OmniQuant, and Atom. The framework achieves a 1.49× inference speedup and significant communication overhead reduction while maintaining or improving model accuracy.
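The abstract does not spell out the TS/TAB-Q algorithms, so the sketch below only illustrates the general idea of thresholding plus per-token bit selection for the activations that cross the split point. The threshold rule, bit widths, and names are assumptions, not the paper's exact procedure.

```python
import numpy as np

def tokenwise_quantize(acts, threshold=6.0, low_bits=4, high_bits=8):
    """Quantize split-point activations token by token.
    `acts` has shape (seq_len, hidden): one activation vector per token."""
    payload = []
    for token in acts:
        peak = float(np.abs(token).max())
        # Illustrative threshold rule: tokens with large-magnitude
        # activations are treated as accuracy-critical and get more bits.
        bits = high_bits if peak > threshold else low_bits
        scale = peak / (2 ** (bits - 1) - 1) if peak > 0 else 1.0
        q = np.round(token / scale).astype(np.int8)   # fits since bits <= 8
        payload.append((q, scale, bits))  # what the edge device transmits
    return payload
```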
- North America > Canada (0.04)
- Asia > South Korea > Daegu > Daegu (0.04)
- Asia > South Korea > Busan > Busan (0.04)
RAGCap-Bench: Benchmarking Capabilities of LLMs in Agentic Retrieval Augmented Generation Systems
Lin, Jingru, Zhang, Chen, Liu, Stephen Y., Li, Haizhou
Retrieval-Augmented Generation (RAG) mitigates key limitations of Large Language Models (LLMs), such as factual errors, outdated knowledge, and hallucinations, by dynamically retrieving external information. Recent work extends this paradigm through agentic RAG systems, where LLMs act as agents to iteratively plan, retrieve, and reason over complex queries. However, these systems still struggle with challenging multi-hop questions, and their intermediate reasoning capabilities remain underexplored. To address this, we propose RAGCap-Bench, a capability-oriented benchmark for fine-grained evaluation of intermediate tasks in agentic RAG workflows. We analyze outputs from state-of-the-art systems to identify common tasks and the core capabilities required for their execution, then construct a taxonomy of typical LLM errors to design targeted evaluation questions. Experiments show that "slow-thinking" models with stronger RAGCap performance achieve better end-to-end results, underscoring the benchmark's validity and the importance of enhancing these intermediate capabilities.
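For readers unfamiliar with the workflow being benchmarked, here is a minimal sketch of the iterative plan/retrieve/reason loop described above. The prompts, the stopping rule, and the `llm`/`retriever` callables are hypothetical; the real systems the benchmark analyzes differ considerably.

```python
def agentic_rag(question, llm, retriever, max_hops=4):
    """Minimal agentic RAG loop: plan a query, retrieve, repeat, then reason."""
    evidence = []
    for _ in range(max_hops):
        # Plan: ask the model what to look up next given evidence so far.
        query = llm(f"Question: {question}\nEvidence: {evidence}\n"
                    "Next search query (or ANSWER if ready):")
        if query.strip() == "ANSWER":
            break
        evidence.extend(retriever(query))   # Retrieve: fetch passages.
    # Reason: synthesize a final answer from the accumulated evidence.
    return llm(f"Question: {question}\nEvidence: {evidence}\nFinal answer:")
```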
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
From Behavioral Performance to Internal Competence: Interpreting Vision-Language Models with VLM-Lens
Sheta, Hala, Huang, Eric, Wu, Shuyu, Alenabi, Ilia, Hong, Jiajun, Lin, Ryker, Ning, Ruoxi, Wei, Daniel, Yang, Jialin, Zhou, Jiawei, Ma, Ziqiao, Shi, Freda
We introduce VLM-Lens, a toolkit designed to enable systematic benchmarking, analysis, and interpretation of vision-language models (VLMs) by supporting the extraction of intermediate outputs from any layer during the forward pass of open-source VLMs. VLM-Lens provides a unified, YAML-configurable interface that abstracts away model-specific complexities and supports user-friendly operation across diverse VLMs. It currently supports 16 state-of-the-art base VLMs and more than 30 of their variants, and is extensible to accommodate new models without changes to the core logic. The toolkit integrates easily with various interpretability and analysis methods. We demonstrate its usage with two simple analytical experiments, revealing systematic differences in the hidden representations of VLMs across layers and target concepts. VLM-Lens is released as an open-source project to accelerate community efforts in understanding and improving VLMs.
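VLM-Lens's own YAML-driven interface is not reproduced here; the sketch below shows the generic PyTorch mechanism such a toolkit can build on, namely forward hooks that capture a named layer's output during a single forward pass.

```python
import torch

def capture_layer_outputs(model, layer_names, inputs):
    """Capture the output of each named layer during one forward pass.
    `inputs` is a dict of model inputs (e.g. pixel values and token ids)."""
    captured, handles = {}, []
    modules = dict(model.named_modules())
    for name in layer_names:
        def hook(module, args, output, name=name):
            # Some blocks return tuples; keep tensors detached either way.
            captured[name] = output.detach() if torch.is_tensor(output) else output
        handles.append(modules[name].register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(**inputs)          # hooks fire during this pass
    finally:
        for h in handles:
            h.remove()               # always clean up the hooks
    return captured
```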
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Oops!... They Stole it Again: Attacks on Split Learning
Khan, Tanveer, Michalas, Antonis
Split Learning (SL) is a collaborative learning approach that improves privacy by keeping data on the client side while sharing only the intermediate output with a server. However, the distributed nature of SL introduces new security challenges, necessitating a comprehensive exploration of potential attacks. This paper systematically reviews various attacks on SL, classifying them based on factors such as the attacker's role, the type of privacy risks, when data leaks occur, and where vulnerabilities exist. We also analyze existing defense methods, including cryptographic methods, data modification approaches, distributed techniques, and hybrid solutions. Our findings reveal security gaps, highlighting the effectiveness and limitations of existing defenses. By identifying open challenges and future directions, this work provides valuable insights for improving SL privacy and guiding further research.
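As a reference point for the attack surface surveyed above, here is a toy split of a model between a data-owning client and a server; only the intermediate activation (the "smashed data") crosses the boundary, which is exactly what these attacks target. The architecture and split point are illustrative.

```python
import torch
import torch.nn as nn

# Client half stays on-device with the private data; server half runs remotely.
client_net = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
server_net = nn.Sequential(nn.Linear(256, 10))

x = torch.randn(32, 784)        # private client data (never shared)
smashed = client_net(x)         # intermediate output sent over the network
logits = server_net(smashed)    # server completes the forward pass
```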
- Europe > Finland > Pirkanmaa > Tampere (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Overview (1.00)
- Research Report > New Finding (0.48)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (1.00)
EasyToHard
The maze data is generated using a depth-first search algorithm; the algorithm is available in the attached code.
Figure 10: Example of small (left) and large (right) maze inputs and targets.
The puzzle data is released by Lichess for public use under the Creative Commons CC0 license.
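Since the maze generator itself is only referenced ("available in the attached code"), the following is a generic randomized depth-first search sketch of the technique named above, not the released implementation.

```python
import random

def dfs_maze(w, h, seed=0):
    """Carve a perfect maze on a w-by-h grid via randomized depth-first search.
    Returns the set of carved passages between adjacent cells."""
    rng = random.Random(seed)
    visited, passages = {(0, 0)}, set()
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        nbrs = [(x + dx, y + dy)
                for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0))
                if 0 <= x + dx < w and 0 <= y + dy < h
                and (x + dx, y + dy) not in visited]
        if nbrs:
            nxt = rng.choice(nbrs)
            passages.add(((x, y), nxt))   # knock down the wall between cells
            visited.add(nxt)
            stack.append(nxt)
        else:
            stack.pop()                   # dead end: backtrack
    return passages
```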
Towards an Optimal Control Perspective of ResNet Training
Püttschneider, Jens, Heilig, Simon, Fischer, Asja, Faulwasser, Timm
We propose a training formulation for ResNets reflecting an optimal control problem that is applicable to standard architectures and general loss functions. We suggest bridging both worlds by penalizing intermediate outputs of hidden states, corresponding to stage cost terms in optimal control. For standard ResNets, we obtain intermediate outputs by propagating the state through the subsequent skip connections and the output layer. We demonstrate that our training dynamic biases the weights of unnecessary deeper residual layers to vanish. This indicates the potential for a theory-grounded layer pruning strategy.
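In optimal-control notation, the described objective can be sketched as below, where x_t is the hidden state after residual block t, C propagates a state through the subsequent skip connections and output layer, and ℓ is the task loss. The stage weights λ_t and this exact form are an assumption based on the abstract, not the paper's stated formulation.

```latex
\min_{\theta_0,\dots,\theta_{T-1}}
  \ \ell\bigl(C(x_T),\,y\bigr)
  \;+\; \sum_{t=1}^{T-1} \lambda_t\,\ell\bigl(C(x_t),\,y\bigr)
\qquad \text{s.t.} \quad
  x_{t+1} = x_t + f(x_t,\theta_t),\quad x_0 = x_{\mathrm{in}}.
```

The per-layer terms ℓ(C(x_t), y) play the role of stage costs; the final-layer term is the usual terminal cost, so standard training is recovered when all λ_t = 0.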
VeriTrail: Closed-Domain Hallucination Detection with Traceability
Metropolitansky, Dasha, Larson, Jonathan
Even when instructed to adhere to source material, Language Models often generate unsubstantiated content - a phenomenon known as "closed-domain hallucination." This risk is amplified in processes with multiple generative steps (MGS), compared to processes with a single generative step (SGS). However, due to the greater complexity of MGS processes, we argue that detecting hallucinations in their final outputs is necessary but not sufficient: it is equally important to trace where hallucinated content was likely introduced and how faithful content may have been derived from the source through intermediate outputs. To address this need, we present VeriTrail, the first closed-domain hallucination detection method designed to provide traceability for both MGS and SGS processes. We also introduce the first datasets to include all intermediate outputs as well as human annotations of final outputs' faithfulness for their respective MGS processes. We demonstrate that VeriTrail outperforms baseline methods on both datasets.
- Law (0.67)
- Energy > Renewable (0.46)
- Government > Regional Government (0.46)
- Information Technology > Services (0.45)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Intermediate Outputs Are More Sensitive Than You Think
Huang, Tao, Huang, Qingyu, Meng, Jiayang
The increasing reliance on deep computer vision models that process sensitive data has raised significant privacy concerns, particularly regarding the exposure of intermediate results in hidden layers. While traditional privacy risk assessment techniques focus on protecting overall model outputs, they often overlook vulnerabilities within these intermediate representations. Current privacy risk assessment techniques typically rely on specific attack simulations to assess risk, which can be computationally expensive and incomplete. This paper introduces a novel approach to measuring privacy risks in deep computer vision models based on the Degrees of Freedom (DoF) and sensitivity of intermediate outputs, without requiring adversarial attack simulations. We propose a framework that leverages DoF to evaluate the amount of information retained in each layer and combines this with the rank of the Jacobian matrix to assess sensitivity to input variations. This dual analysis enables systematic measurement of privacy risks at various model layers. Our experimental validation on real-world datasets demonstrates the effectiveness of this approach in providing deeper insights into privacy risks associated with intermediate representations.
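The sensitivity half of the proposed measurement can be sketched directly in PyTorch: compute the Jacobian of an intermediate representation with respect to the input and take its rank. The `layer_fn` interface and the example network are hypothetical, and the paper's DoF estimator is not reproduced here.

```python
import torch

def jacobian_rank(layer_fn, x):
    """Rank of the Jacobian of an intermediate representation w.r.t. the input.
    `layer_fn` maps a flattened input to the flattened intermediate output of
    the layer under analysis."""
    J = torch.autograd.functional.jacobian(layer_fn, x)  # (out_dim, in_dim)
    return torch.linalg.matrix_rank(J).item()

# Example: sensitivity of the first hidden layer of a small MLP.
net = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.Tanh())
print(jacobian_rank(lambda v: net(v), torch.randn(16)))
```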
SVIP: Towards Verifiable Inference of Open-source Large Language Models
Sun, Yifan, Li, Yuhang, Zhang, Yue, Jin, Yuchen, Zhang, Huan
Open-source Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language understanding and generation, leading to widespread adoption across various domains. However, their increasing model sizes render local deployment impractical for individual users, pushing many to rely on computing service providers for inference through a black-box API. This reliance introduces a new risk: a computing provider may stealthily substitute the requested LLM with a smaller, less capable model without consent from users, thereby delivering inferior outputs while benefiting from cost savings. Existing verifiable computing solutions based on cryptographic or game-theoretic techniques are either computationally uneconomical or rest on strong assumptions. To address this, we propose SVIP, a secret-based verifiable LLM inference protocol that leverages the model's intermediate outputs as identifiers. By training a proxy task on these outputs and requiring the computing provider to return both the generated text and the processed intermediate outputs, users can reliably verify whether the computing provider is acting honestly. In addition, the integration of a secret mechanism further enhances the security of our protocol. We thoroughly analyze our protocol under multiple strong and adaptive adversarial scenarios.

In recent years, Large Language Models (LLMs) have achieved unprecedented success across a broad array of tasks and domains (Achiam et al., 2023; Dubey et al., 2024; Yang et al., 2024). Alongside this progress, open-source LLMs have proliferated, offering increasingly sophisticated and capable models to the broader research community (Touvron et al., 2023b; Black et al., 2022; Le Scao et al., 2023; Jiang et al., 2023; Almazrouei et al., 2023; Zhang et al., 2023). Many of these open-source LLMs now rival, or even surpass, their closed-source counterparts in performance (Chiang et al., 2023; Almazrouei et al., 2023; Dubey et al., 2024), while remaining freely accessible. However, as model capacity grows, it typically comes with a corresponding increase in the number of model parameters, which directly drives up the computational demands, particularly in terms of memory and processing power (Kukreja et al., 2024).
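A minimal sketch of the verification step implied by the abstract: a lightweight proxy head, trained in advance on the requested model's intermediate outputs, scores the activations the provider returns alongside the generated text. The head's architecture, the scoring rule, and the threshold are assumptions, not SVIP's exact protocol, and the secret mechanism it mentions is not modeled here.

```python
import torch
import torch.nn as nn

def verify_response(proxy_head, intermediate, threshold=0.5):
    """Score provider-returned intermediate outputs with a pre-trained proxy
    head; a low score suggests a substituted model."""
    with torch.no_grad():
        score = torch.sigmoid(proxy_head(intermediate)).mean().item()
    return score >= threshold  # True: consistent with the requested model

# Hypothetical usage: a linear proxy head over 4096-dim hidden states.
proxy_head = nn.Linear(4096, 1)
ok = verify_response(proxy_head, torch.randn(10, 4096))
```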
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
- North America > The Bahamas (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)